145 research outputs found
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Convolutional neural networks have been widely deployed in various
application scenarios. To extend their use to accuracy-critical domains,
researchers have investigated deeper or wider network structures, which boost
accuracy but incur rapidly growing computational and storage costs and slow
the response time. In this paper, we propose a general training framework
named self distillation, which notably enhances the performance (accuracy) of
convolutional neural networks by shrinking the size of the network rather than
enlarging it. Unlike traditional knowledge distillation, a knowledge transfer
method between networks in which a student network is trained to approximate
the softmax outputs of a pre-trained teacher network, the proposed self
distillation framework distills knowledge within the network itself. The
network is first divided into several sections, and the knowledge in the
deeper sections is then squeezed into the shallower ones. Experiments further
demonstrate the generality of the proposed self distillation framework: the
average accuracy improvement is 2.65%, ranging from 0.61% on ResNeXt at the
minimum to 4.07% on VGG19 at the maximum. In addition, it also enables
depth-wise scalable inference on resource-limited edge devices. Our code will
be released on GitHub soon.
Comment: 10 pages
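
The training scheme this abstract describes, auxiliary classifiers attached to shallow sections that learn from the deepest classifier's softened outputs, can be illustrated with a toy PyTorch snippet. This is only a minimal sketch of the general idea under my own assumptions; the SectionedNet architecture, head design, and loss weights below are illustrative, not the paper's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SectionedNet(nn.Module):
    """A toy CNN split into sections, each followed by an auxiliary classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.sections = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)),
        ])
        # One classifier per section; intermediate heads pool their feature maps first.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes)),
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)),
            nn.Sequential(nn.Flatten(), nn.Linear(64, num_classes)),
        ])

    def forward(self, x):
        logits = []
        for section, head in zip(self.sections, self.heads):
            x = section(x)
            logits.append(head(x))
        return logits  # ordered shallow -> deep

def self_distillation_loss(logits, labels, T=3.0, alpha=0.3):
    """Cross-entropy on every head plus KL from each shallow head to the deepest one."""
    loss = sum(F.cross_entropy(l, labels) for l in logits)
    soft_target = F.softmax(logits[-1].detach() / T, dim=1)  # deepest head as "teacher"
    for shallow in logits[:-1]:
        loss = loss + alpha * T * T * F.kl_div(
            F.log_softmax(shallow / T, dim=1), soft_target, reduction="batchmean")
    return loss

# Usage: one training step on random data.
model = SectionedNet()
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss = self_distillation_loss(model(x), y)
loss.backward()

Because every section carries its own classifier, inference can stop after any section, which is what makes the depth-wise scalable deployment mentioned above possible.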
Semantic-based Pre-training for Dialogue Understanding
Pre-trained language models have made great progress on dialogue tasks.
However, these models are typically trained on surface dialogue text and have
been shown to be weak at understanding the main semantic meaning of a dialogue
context. We investigate Abstract Meaning Representation (AMR) as explicit
semantic knowledge for pre-training, so that models capture the core semantic
information in dialogues. In particular, we propose a semantic-based
pre-training framework that extends the standard pre-training framework
(Devlin et al., 2019) with three tasks for learning 1) core semantic units,
2) semantic relations, and 3) the overall semantic representation according to
AMR graphs. Experiments on the understanding of both chit-chat and
task-oriented dialogues show the superiority of our model. To our knowledge,
we are the first to leverage a deep semantic representation for dialogue
pre-training.
Comment: Accepted as oral in COLING202
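
As a rough illustration of the three AMR-based objectives listed in the abstract, the following PyTorch sketch adds a concept head, a relation head, and a graph-level head on top of a generic transformer encoder. All module names, label spaces, and the pooled graph-level loss are assumptions made for illustration; the paper's actual pre-training setup may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AMRPretrainHeads(nn.Module):
    def __init__(self, hidden=256, num_concepts=500, num_relations=40):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2)
        self.concept_head = nn.Linear(hidden, num_concepts)             # 1) core semantic units
        self.relation_head = nn.Bilinear(hidden, hidden, num_relations) # 2) semantic relations
        self.graph_head = nn.Linear(hidden, hidden)                     # 3) overall representation

    def forward(self, token_emb, head_idx, tail_idx):
        h = self.encoder(token_emb)                   # (B, T, H) contextual states
        concept_logits = self.concept_head(h)         # predict an AMR concept per token
        heads = h.gather(1, head_idx[..., None].expand(-1, -1, h.size(-1)))
        tails = h.gather(1, tail_idx[..., None].expand(-1, -1, h.size(-1)))
        relation_logits = self.relation_head(heads, tails)  # relation between node pairs
        graph_vec = self.graph_head(h.mean(dim=1))    # pooled dialogue-level semantics
        return concept_logits, relation_logits, graph_vec

# Usage with toy inputs: 2 dialogues, 8 tokens each, 3 AMR edges per dialogue.
model = AMRPretrainHeads()
emb = torch.randn(2, 8, 256)
head_idx = torch.randint(0, 8, (2, 3))
tail_idx = torch.randint(0, 8, (2, 3))
concepts, relations, graph = model(emb, head_idx, tail_idx)
loss = (F.cross_entropy(concepts.reshape(-1, 500), torch.randint(0, 500, (16,)))
        + F.cross_entropy(relations.reshape(-1, 40), torch.randint(0, 40, (6,)))
        + F.mse_loss(graph, torch.randn(2, 256)))  # e.g. regress toward a target graph embedding
loss.backward()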
Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment
Existing entity alignment methods differ mainly in how they encode the
knowledge graph, but they typically use the same decoding method, which
independently chooses the locally optimal match for each source entity. This
decoding method not only causes the "many-to-one" problem but also neglects
the coordinated nature of the task: each alignment decision may be highly
correlated with the other decisions. In this paper, we introduce two
coordinated reasoning methods, an Easy-to-Hard decoding strategy and a joint
entity alignment algorithm. Specifically, the Easy-to-Hard strategy first
retrieves the model-confident alignments from the predicted results and then
incorporates them as additional knowledge to resolve the remaining
model-uncertain alignments. To achieve this, we further propose an enhanced
alignment model built on the current state-of-the-art baseline. In addition,
to address the many-to-one problem, we propose to predict entity alignments
jointly, so that the one-to-one constraint is naturally incorporated into the
alignment prediction. Experimental results show that our model achieves
state-of-the-art performance and that our reasoning methods also significantly
improve existing baselines.
Comment: in AAAI 202
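
The two decoding ideas, resolving high-confidence alignments first and then enforcing a one-to-one constraint over the rest, can be approximated with a small Python sketch. The margin-based confidence test and the use of maximum-weight bipartite matching via scipy's linear_sum_assignment are my own assumptions about how such a decoder could look, not the paper's exact algorithm.

import numpy as np
from scipy.optimize import linear_sum_assignment

def easy_to_hard_joint_decode(sim, confidence_margin=0.3):
    """sim[i, j]: similarity between source entity i and target entity j."""
    n_src, n_tgt = sim.shape
    alignment = {}

    # Easy step: accept a source entity whose best match beats the runner-up by a
    # margin, as long as no earlier accepted source entity already claimed that target.
    for i in range(n_src):
        order = np.argsort(sim[i])[::-1]
        best, second = order[0], order[1]
        if sim[i, best] - sim[i, second] >= confidence_margin and best not in alignment.values():
            alignment[i] = best

    # Hard step: jointly assign the remaining sources to the remaining targets so the
    # one-to-one constraint holds, via maximum-weight bipartite matching.
    rest_src = [i for i in range(n_src) if i not in alignment]
    rest_tgt = [j for j in range(n_tgt) if j not in alignment.values()]
    if rest_src and rest_tgt:
        rows, cols = linear_sum_assignment(-sim[np.ix_(rest_src, rest_tgt)])
        for r, c in zip(rows, cols):
            alignment[rest_src[r]] = rest_tgt[c]
    return alignment

# Usage: 4 source entities, 4 target entities, random similarity scores.
rng = np.random.default_rng(0)
print(easy_to_hard_joint_decode(rng.random((4, 4))))

The joint assignment in the second stage is what rules out "many-to-one" matches, since each remaining target can be used at most once.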
- …